How Verb Subcategorization Frequencies Are Affected By Corpus Choice
نویسندگان
چکیده
The probabilistic relation between verbs and their arguments plays an important role in modern statistical parsers and supertaggers, and in psychological theories of language processing. But these probabilities are computed in very different ways by the two sets of researchers. Computational linguists compute verb subcategorization probabilities from large corpora while psycholinguists compute them from psychological studies (sentence production and completion tasks). Recent studies have found differences between corpus frequencies and psycholinguistic measures. We analyze subcategorization frequencies from four different corpora: psychological sentence production data (Connine et al. 1984), written text (Brown and WSJ), and telephone conversation data (Switchboard). We find two different sources for the differences. Discourse influence is a result of how verb use is affected by different discourse types such as narrative, connected discourse, and single sentence productions. Semantic influence is a result of different corpora using different senses of verbs, which have different subcategorization frequencies. We conclude that verb sense and discourse type play an important role in the frequencies observed in different experimental and corpus based sources of verb subcategorization frequencies.
منابع مشابه
Verb Sense and Verb Subcategorization Probabilities
Roland, Douglas William (Ph.D., Linguistics) Verb Sense and Verb Subcategorization Probabilities Thesis directed by Associate Professor Daniel S. Jurafsky This dissertation investigates a variety of problems in psycholinguistics and computational linguistics caused by the differences in verb subcategorization probabilities found between various corpora and experimental data sets. For psycholing...
متن کاملVerb Subcategorization Frequency Differences Between Business-News And Balanced Corpora: The Role Of Verb Sense
We explore the differences in verb subcategorization frequencies across several corpora in an effort to obtain stable cross corpus subcategorization probabilities for use in norming psychological experiments. For the 64 single sense verbs we looked at, subcategorization preferences were remarkably stable between British and American corpora, and between balanced corpora and financial news corpo...
متن کاملVerb subcategorization frequencies: American English corpus data, methodological studies, and cross-corpus comparisons.
Verb subcategorization frequencies (verb biases) have been widely studied in psycholinguistics and play an important role in human sentence processing. Yet available resources on subcategorization frequencies suffer from limited coverage, limited ecological validity, and divergent coding criteria. Prior estimates of verb transitivity, for example, vary widely with corpus size, coverage, and cod...
متن کاملThe Automatic Acquisition Of Frequencies Of Verb Subcategorization Frames From Tagged Corpora
We describe a mechanism for automatically acquiring verb subcategorization frames and their frequencies in a large corpus. A tagged corpus is first partially parsed to identify noun phrases and then a finear grammar is used to estimate the appropriate subcategorization frame for each verb token in the corpus. In an experiment involving the identification of six fixed subcategorization frames, o...
متن کاملLearning Probabilistic Subcategorization Preference by Identifying Case Dependencies and Optimal Noun Class Generalization Level
This paper proposes a novel method of learning probabilistic subcategorization preference. In the method, for the purpose of coping with the ambiguities of case dependencies and noun class generalization of argument/adjunct nouns, we introduce a data structure which represents a tuple of independent partial subcategorization frames. Each collocation of a verb and argument/adjunct nouns is assum...
متن کامل